UNISON: Unpaired Cross-Lingual Image Captioning

Authors

Abstract

Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (1) a cross-lingual auto-encoding process, which utilizes a sentence parallel (bitext) corpus to learn the mapping from the source to the target language in the scene graph encoding space and decode sentences in the target language, and (2) a cross-modal unsupervised feature mapping, which seeks to map the encoded scene graph features from the image modality to the language modality. We verify the effectiveness of our proposed method on the Chinese image caption generation task. Comparisons against several existing methods demonstrate the effectiveness of our approach.
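The abstract describes phase (2) only as a "cross-modal unsupervised feature mapping" from the image modality to the language modality, without saying how that mapping is learned. To illustrate what mapping features between two spaces without paired examples can look like, here is a minimal Python sketch that substitutes a deliberately simple technique: per-dimension moment matching, which shifts and scales each feature dimension so that its mean and standard deviation match those of the other side. This is an assumption for illustration only, not the UNISON mapping; the function name `fit_moment_matching` and the toy feature vectors are hypothetical.

```python
import statistics
from typing import Callable, List

Vector = List[float]


def fit_moment_matching(src: List[Vector], tgt: List[Vector]) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for an unsupervised feature mapping (not UNISON's method).

    For each feature dimension independently, fit an affine map x -> a * x + b
    so that the mapped source features share the mean and population standard
    deviation of the target features. Only two unpaired sets are needed.
    """
    dims = len(src[0])
    params = []
    for d in range(dims):
        s = [v[d] for v in src]
        t = [v[d] for v in tgt]
        s_mu, t_mu = statistics.mean(s), statistics.mean(t)
        # Avoid division by zero if a source dimension is constant.
        s_sd = statistics.pstdev(s) or 1.0
        t_sd = statistics.pstdev(t)
        a = t_sd / s_sd
        params.append((a, t_mu - a * s_mu))

    def apply(v: Vector) -> Vector:
        # Apply the per-dimension scale and shift fitted above.
        return [a * x + b for (a, b), x in zip(params, v)]

    return apply


if __name__ == "__main__":
    # Made-up toy features: "image-side" vectors and unpaired "language-side" vectors.
    image_feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    text_feats = [[10.0, -1.0], [20.0, 1.0]]
    to_text_space = fit_moment_matching(image_feats, text_feats)
    print([to_text_space(v) for v in image_feats])
```

In a two-phase pipeline of the kind the abstract outlines, image-side scene graph features would be passed through some such map and then handed to a target-language decoder trained in phase (1). This stand-in matches only first and second moments per dimension and ignores correlations between dimensions, so a learned mapping would be expected to replace it in practice.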


Similar Resources

Unpaired Image Captioning by Language Pivoting

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some language, large scale image-caption paired corpus might not be available. We present an approach to this ...

Cross-Lingual Image Caption Generation

Automatically generating a natural language description of an image is a fundamental problem in artificial intelligence. This task involves both computer vision and natural language processing and is called “image caption generation.” Research on image caption generation has typically focused on taking in an image and generating a caption in English as existing image caption corpora are mostly ...

Cross-Lingual Image Search on the Web

Most people locate images on the Web by querying image search engines such as Google’s. The images are tagged by the words in their “vicinity”, which limits the ability of a searcher to retrieve them. Although images are universal, an English searcher will fail to find images tagged in Chinese, and a Spanish searcher will fail to find images tagged in English. Cross-lingual homonyms cause probl...

Phrase-based Image Captioning

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representat...

Domain-Specific Image Captioning

We present a data-driven framework for image caption generation which incorporates visual and textual features with varying degrees of spatial structure. We propose the task of domain-specific image captioning, where many relevant visual details cannot be captured by off-the-shelf general-domain entity detectors. We extract previously-written descriptions from a database and adapt them to new q...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i10.21310